Background: Accurate diagnosis of myelodysplastic syndromes (MDS) relies on hematopathologist evaluation of bone marrow (BM) specimens to identify morphologic dysplasia, a labor-intensive and error-prone process. Diagnosis is further complicated by the recent recognition of pre-MDS conditions such as idiopathic cytopenia of undetermined significance (ICUS) and clonal cytopenia of undetermined significance (CCUS), in which cytopenia and mild morphologic abnormalities may not meet MDS diagnostic criteria. We previously developed a deep learning pipeline to distinguish MDS from ICUS/CCUS using digitized bone marrow aspirate (BMA) smears (Dave et al., Blood, 2024). Because BM biopsy (BMB) and BMA provide complementary information in MDS evaluation, we now present an improved deep learning model trained on digitized BMB slides for distinguishing MDS from ICUS and CCUS (n = 645). Model performance was further evaluated on an independent external cohort (n = 96). This AI-assisted BM review framework has the potential to enhance diagnostic consistency and efficiency and to support clinical decision-making in myeloid neoplasms.

Methods: For BMB examination, we used Prov-GigaPath, a whole-slide foundation model pretrained on millions of pathology images, to leverage the pathology knowledge acquired during its extensive pretraining. The pretrained model was fine-tuned for the current application using the MDS Natural History Study (NHS, NCT02775383) dataset (n = 645; 319 MDS, 276 CCUS, and 50 ICUS). CCUS and ICUS were combined into a ‘non-MDS’ category, based on the absence of definitive morphologic dysplasia or findings of <10% dysplasia and <5% blasts. We also validated the previously presented BMA model and the newly developed BMB model on an independent institutional set of 96 patients (59 non-MDS and 37 MDS) as an external test set. This study focused solely on morphology and excluded clinical and molecular data. Performance was evaluated using AUROC, AUPRC, precision, sensitivity, specificity, and accuracy, with AUROC/AUPRC ranging from 0.5 (random) to 1.0 (perfect). The “gold standard” was central pathologist diagnosis of MDS.
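As an illustration of the evaluation metrics named above, the following minimal sketch computes sensitivity, specificity, accuracy, and AUROC for a binary MDS versus non-MDS task. The function names and the label encoding (1 = MDS, 0 = non-MDS) are assumptions made for this example and are not taken from the study code. AUROC is computed here by direct pairwise comparison, which is one standard formulation and not necessarily the implementation the authors used.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels; 1 = MDS, 0 = non-MDS (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from hard predictions.

    Assumes both classes are present in y_true (otherwise a division by zero occurs).
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),   # fraction of MDS cases called MDS
        "specificity": tn / (tn + fp),   # fraction of non-MDS cases called non-MDS
        "accuracy": (tp + tn) / len(y_true),
    }


def auroc(y_true, scores):
    """AUROC as the probability that a random positive case outscores a random
    negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise AUROC is quadratic in cohort size, which is acceptable for cohorts of a few hundred slides; a rank-based formulation would be preferable for much larger sets.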

Details of the BMB-based model are as follows. The gigapixel whole-slide images of the BMB were processed using the two-stage architecture of Prov-GigaPath. Each slide was first divided into 256 × 256 tiles, and vision features were extracted from each tile using a Vision Transformer-based tile encoder. The tile features were then aggregated by a LongNet-based slide encoder to obtain slide-level vision features. Finally, a fully connected classification layer was trained on the slide-level features to discriminate MDS from the non-MDS class.
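The data flow described above (tile the slide, encode tiles, aggregate to a slide-level feature, classify) can be sketched in plain Python. This is a toy illustration only: it does not call Prov-GigaPath, the tile encoder is omitted, and simple mean pooling stands in for the LongNet-based slide encoder. All function names are hypothetical, and the choice to discard partial tiles at the slide border is an assumption, since the handling of border tiles is not stated.

```python
import math

TILE = 256  # tile edge length, as described in the text


def tile_origins(width, height, tile=TILE):
    """Top-left corners of non-overlapping tiles that fit fully inside the slide.

    Partial tiles at the right and bottom borders are dropped (an assumption).
    """
    return [
        (x, y)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


def mean_pool(tile_features):
    """Stand-in for the slide encoder: average the tile feature vectors.

    The actual model aggregates tiles with a LongNet-based encoder, not a mean.
    """
    n = len(tile_features)
    dim = len(tile_features[0])
    return [sum(f[i] for f in tile_features) / n for i in range(dim)]


def classify(slide_feature, weights, bias):
    """Fully connected layer with a sigmoid output, giving P(MDS) for one slide."""
    z = sum(w * x for w, x in zip(weights, slide_feature)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `tile_origins(1024, 512)` yields eight tile origins, and a slide feature of all zeros with zero bias gives a probability of 0.5 from `classify`.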

Results: The BMB-based model achieved a 5-fold cross-validation AUROC of 0.75 ± 0.04 and, on a held-out internal validation set (n = 129) from the MDS NHS cohort, an AUROC of 0.81 with 75.97% accuracy. Sensitivity and specificity on this validation set were 0.75 and 0.77, respectively. On the external test cohort (n = 96), the model achieved 77.08% accuracy, 0.89 AUROC, 0.86 AUPRC, weighted precision of 0.78, weighted sensitivity of 0.77, and specificity (negative class) of 0.76 after model calibration.

Similarly, our prior BMA-based model achieved 85.26% accuracy, 0.89 AUROC, 0.89 AUPRC, weighted precision of 0.86, weighted sensitivity of 0.85, and specificity (negative class) of 0.85 on the same external cohort. As an initial assessment of model explainability, we generated histograms of the nucleated cell types (‘Basophil’, ‘Blast’, ‘Eosinophil’, ‘Erythroblast’, ‘Lymphocyte’, ‘Megakaryocyte’, ‘Metamyelocyte’, ‘Monocyte’, ‘Myelocyte’, ‘Neutrophil’, ‘Other cell’, ‘Plasma cell’, ‘Promyelocyte’) contributing most to the model's decision across the 5-fold cross-validation runs. In MDS cases, the model consistently focused on different cell types depending on the dysplastic lineage, whereas in non-MDS cases it relied primarily on the most abundant cell type within the BMA (e.g., neutrophils).
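The histogram step can be illustrated with a short sketch that tallies, per diagnostic class, how often each cell type was the top contributor. How the top-contributing cell type is determined for each case is not specified here, so the sketch assumes it has already been computed; the record format and function name are hypothetical.

```python
from collections import Counter, defaultdict


def top_cell_type_histograms(records):
    """Count top-contributing cell types per diagnostic class.

    records: iterable of (diagnosis, top_cell_type) pairs pooled across
    cross-validation folds (assumed input format).
    """
    histograms = defaultdict(Counter)
    for diagnosis, cell_type in records:
        histograms[diagnosis][cell_type] += 1
    # Convert to plain dicts for easy inspection or plotting.
    return {diagnosis: dict(counts) for diagnosis, counts in histograms.items()}
```

The resulting per-class counts can be passed directly to any plotting library to draw one histogram for MDS and one for non-MDS cases.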

Conclusion: We present machine learning models for automated analysis of BMA and BMB to distinguish MDS from pre-MDS conditions. The models show promising performance on a multi-institutional discovery dataset and an independent external validation cohort. Future work includes validation on a larger external cohort, integration of the aspirate and biopsy models for multimodal analysis, further study of model explainability, and evaluation of high-risk vs low/intermediate-risk CCUS cases.
